WO 2004/019230 



PCT/US2003/026025 




What is Claimed: 

1. A method for generating structured document files from a document
image, the method comprising the steps of:

segmenting the document image into one or more zones, at least one of the
one or more zones containing a respective text image;

converting the respective text images within the at least one of the one or
more zones to digital text;

automatically identifying layout information for each of the one or more
zones;

labeling each of the one or more zones in accordance with a schema; and

automatically associating mark-up language tags with the labeled zones to
generate the structured document files responsive to the identified layout information and
a model file.
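
The labeling and tagging steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Zone` structure, the `label_zones` and `generate_xml` helpers, and the representation of the model file as a mapping from schema labels to simple layout predicates are all assumptions made for illustration. Segmentation and text conversion are assumed to have already produced the zones.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class Zone:
    """A rectangular region of the document image (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    text: str = ""   # digital text produced by the converting step
    label: str = ""  # schema label assigned by the labeling step


def label_zones(zones, schema_labels, model):
    """Assign each zone the first schema label whose model predicate accepts it.

    `model` stands in for the model file: a dict mapping a label name to a
    predicate on the zone's layout information (an assumption of this sketch).
    """
    for zone in zones:
        for name in schema_labels:
            if model[name](zone):
                zone.label = name
                break
    return zones


def generate_xml(zones, root_tag="document"):
    """Wrap each labeled zone in a tag named after its label.

    Layout information is carried along as attributes on each element.
    """
    root = ET.Element(root_tag)
    for zone in zones:
        elem = ET.SubElement(root, zone.label or "unlabeled")
        elem.set("x", str(zone.x))
        elem.set("y", str(zone.y))
        elem.set("width", str(zone.width))
        elem.set("height", str(zone.height))
        elem.text = zone.text
    return ET.tostring(root, encoding="unicode")


# Zones as they might look after segmentation and text conversion.
zones = [
    Zone(50, 40, 500, 60, text="Annual Report"),
    Zone(50, 120, 500, 400, text="Revenue grew in 2003."),
]
schema_labels = ["title", "body"]
model = {
    "title": lambda z: z.y < 100,  # zone near the top of the page
    "body": lambda z: True,        # fallback label
}

xml_output = generate_xml(label_zones(zones, schema_labels, model))
print(xml_output)
```

In this sketch the layout information (position and size) both drives the labeling and is preserved in the generated mark-up, which is one plausible reading of "responsive to the identified layout information and a model file".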

2. The method of claim 1, wherein the model file is associated with the
schema and wherein the labeling step comprises at least the steps of:

automatically labeling each of the one or more zones responsive to the
model file.

3. The method of claim 1, further comprising the steps of:

receiving editing commands corresponding to the one or more zones; and

updating the one or more zones responsive to the editing commands.

4. The method of claim 3, wherein the step of receiving editing
commands includes the step of receiving text editing commands and the step of updating
the one or more zones includes the step of editing the digital text responsive to the text
editing commands.

5. The method of claim 3, wherein the step of receiving editing
commands includes the step of receiving segmenting commands and the step of updating
the one or more zones includes the step of updating characteristics of the one or more
zones responsive to the segmenting commands.

6. The method of claim 1, further comprising the step of: 
receiving editing commands corresponding to the schema; 
updating the schema responsive to the editing commands. 

7. The method of claim 1, wherein the respective text images are 
displayed on a graphical user interface (GUI) and wherein the converting step comprises at 
least the step of: 

overlaying the respective text images displayed on the GUI with the at least 
one of the one or more zones with the corresponding digital text. 

8. The method of claim 1, wherein the structured document files include 
an XML file and an XSL file for each document image and wherein the generating step
comprises at least the step of:

formatting the XSL file such that information corresponding to each of the
labeled zones in the XML file is displayed in multiple layers on a web browser. 
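
One way to picture the XML and XSL pairing recited above is sketched below in Python. It is an illustration under stated assumptions, not the claimed formatting: the element names, the `x` and `y` attributes, the `page.xsl` file name, and the use of absolutely positioned `div` elements with increasing `z-index` values as "layers" are all hypothetical choices. The XML file refers to the stylesheet through the standard `xml-stylesheet` processing instruction.

```python
from xml.etree import ElementTree as ET

# Hypothetical stylesheet: each labeled zone becomes an absolutely positioned
# <div>, stacked by z-index so the zones render as separate layers.
XSL_TEXT = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/document">
    <html>
      <body>
        <xsl:for-each select="*">
          <div style="position:absolute; left:{@x}px; top:{@y}px; z-index:{position()}">
            <xsl:value-of select="."/>
          </div>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""


def build_xml(zones, xsl_href="page.xsl"):
    """Build the XML file text for one document image.

    `zones` is a list of (label, x, y, text) tuples; the processing
    instruction links the XML file to its XSL file.
    """
    root = ET.Element("document")
    for label, x, y, text in zones:
        elem = ET.SubElement(root, label)
        elem.set("x", str(x))
        elem.set("y", str(y))
        elem.text = text
    stylesheet_pi = '<?xml-stylesheet type="text/xsl" href="%s"?>' % xsl_href
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + stylesheet_pi + "\n" + body


xml_text = build_xml([
    ("title", 50, 40, "Annual Report"),
    ("body", 50, 120, "Revenue grew."),
])
print(xml_text)
```

Writing `xml_text` and `XSL_TEXT` to `page.xml` and `page.xsl` in the same directory would give a file pair of the kind the claim describes, one XML file and one XSL file per document image.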

9. The method of claim 1, wherein the steps of segmenting, converting, 
labeling, and automatically associating mark-up language tags are performed sequentially 
responsive to a selection of a workflow icon of a graphical user interface and wherein the 
method further comprises the step of: 

updating the workflow icon to represent a next step of the segmenting, 
converting, labeling, and automatically associating mark-up language tags to be 
performed, wherein the workflow icon presents a unique image corresponding to each 
step. 
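
The workflow icon behavior recited above resembles a small state machine, sketched here in Python. The step names, the image file names, and the `WorkflowIcon` class are hypothetical; the sketch shows only that each selection performs the next step in sequence and that the icon then presents a distinct image for the step to be performed next.

```python
# Hypothetical step names and icon images, one unique image per step.
STEPS = ["segment", "convert", "label", "generate"]
ICONS = {
    "segment": "segment.png",
    "convert": "convert.png",
    "label": "label.png",
    "generate": "generate.png",
}
DONE_ICON = "done.png"  # shown once no steps remain (an assumption)


class WorkflowIcon:
    """Tracks the next step to perform and the image representing it."""

    def __init__(self, actions):
        self.actions = actions  # maps step name -> callable performing that step
        self.index = 0

    @property
    def next_step(self):
        return STEPS[self.index] if self.index < len(STEPS) else None

    @property
    def image(self):
        step = self.next_step
        return ICONS[step] if step is not None else DONE_ICON

    def click(self):
        """Perform the next step, then advance so the icon shows the following one."""
        step = self.next_step
        if step is None:
            return None
        self.actions[step]()
        self.index += 1
        return step


# Record the order in which steps run and the images the icon presents.
log = []
icon = WorkflowIcon({step: (lambda s=step: log.append(s)) for step in STEPS})
images = [icon.image]
while icon.click():
    images.append(icon.image)

print(log)
print(images)
```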

10. A system for generating structured document files from a document 
image, the system comprising: 



means for segmenting the document image into one or more zones, at least
one of the one or more zones containing a respective text image;

means for converting the respective text images within the at least one of
the one or more zones to digital text;

means for automatically identifying layout information for each of the one or
more zones;

means for labeling each of the one or more zones in accordance with a
schema; and

means for automatically associating mark-up language tags with the labeled
zones to generate the structured document files responsive to the identified layout
information and a model file.

11. The system of claim 10, further comprising: 

means for receiving editing commands corresponding to the one or more
zones; and

means for updating the one or more zones responsive to the editing
commands.

12. The system of claim 10, further comprising: 

means for receiving editing commands corresponding to the schema;
means for updating the schema responsive to the editing commands. 

13. A structured mark-up language generator for generating structured 
document files from a document image, the generator comprising: 

a document processor that: a) segments the document image into one or 
more zones, at least one of the one or more zones containing a respective text image, b) 
identifies layout information for each of the one or more zones, and c) converts the 
respective text images within the at least one of the one or more zones to digital text; 

a labeler that labels each of the one or more zones in accordance with a
schema; and

a structured document generator that generates the structured document
files responsive to the identified layout information and a model file.

14. The generator of claim 13, further comprising:

an editor coupled to the document processor that enables editing of the
digital text and the one or more zones.

15. The generator of claim 13, further comprising:

an editor coupled to the labeler that enables editing of the labels for each of
the one or more zones.

16. A graphical user interface (GUI) for generating structured document
files from a document image, the GUI comprising:

a document panel for displaying a document image;

a schema panel for displaying a schema corresponding to the document
image; and

a workflow icon for directing the generation of at least one structured mark-up
language document from the document image, the workflow icon reflecting a next step
in a process to generate the at least one structured mark-up language document.

17. The GUI of claim 16, wherein the process includes sequentially
performing the steps of loading an image, segmenting the image into zones, converting
text within the zones to digital text, labeling the zones, and generating the at least one
structured document and wherein the workflow icon is updated during the process to
present unique images corresponding to each step to be performed in the process.

18. A computer readable medium including software that is configured to
control a general purpose computer to implement a method for generating structured
document files from a document image, the method comprising the steps of:

segmenting the document image into one or more zones, at least one of the
one or more zones containing a respective text image;

converting the respective text images within the at least one of the one or
more zones to digital text;

automatically identifying layout information for each of the one or more
zones;

labeling each of the one or more zones in accordance with a schema; and

automatically associating mark-up language tags with the labeled zones to
generate the structured document files responsive to the identified layout information and
a model file.

19. The computer readable medium of claim 18, wherein the method 
implemented by the software configured general purpose computer further comprises: 

updating the one or more zones responsive to editing commands 
corresponding to the one or more zones. 

20. The computer readable medium of claim 18, wherein the method 
implemented by the software configured general purpose computer further comprises: 

updating the schema responsive to editing commands corresponding to the
schema.



